2. The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.4
BASE
4. The Twitter user dataset for discriminating between Bosnian, Croatian, Montenegrin and Serbian Twitter-HBS 1.0
BASE
8. The news dataset for discriminating between Bosnian, Croatian and Serbian SETimes.HBS 1.0
Abstract:
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the now-inactive setimes.com website, which published news in the languages of South-Eastern Europe. While the writing process of the documents is not known, they are quite likely independent translations from English. The main intended use of this dataset is discrimination between closely related languages. This dataset is not a traditional parallel dataset, as there are no explicit links between parallel documents. Special care was taken that each document appears in the same bin (training, development or testing) in all three languages, since data leakage between the three bins, given the similarity of the three languages, could be problematic for benchmarking.
Keyword:
closely related languages; language identification; news corpus
URL: http://hdl.handle.net/11356/1461
BASE
9. The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.3
BASE
10. The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild ...
BASE
13. Retweet communities reveal the main sources of hate speech
In: PLoS One (2022)
BASE
14. The ParlaMint corpora of parliamentary proceedings
In: Lang Resour Eval (2022)
BASE
18. Choice of plausible alternatives dataset in Croatian COPA-HR
BASE
19. Croatian corpus of non-professional written language by typical speakers and speakers with language disorders RAPUT 1.0
BASE
20. The Orange workflow for observing collocation trends ColTrend 1.0
BASE